Coupled Auto-Enrollment and Speaker Identification Platform for Real-Time Applications

Nicolas Shu

2024-04-11

Intro

B.S. in Biochemistry

M.S. in Biomedical Engineering

Summer Internship

M.S. in Electrical and Computer Engineering

M.S. in Computer Science

Ph.D in Machine Learning

Intro

Let's talk about dementia

Intro

Negative Interactions increase risk factors for dementia by ~60%

Being excluded from activities

Receiving unsolicited advice

Others having unsympathetic or insensitive behaviors

Others failing to provide help

Wilson, Robert S et al. “Negative social interactions and risk of mild cognitive impairment in old age.” Neuropsychology vol. 29,4 (2015): 561-70. doi:10.1037/neu0000154

Intro

Number of Interactions may affect risk of MCI

High Number of Interactions

Low Number of Interactions

E.g. men living alone -> 2x cognitive decline in 10 years

How can we monitor people?

Personally Check in

Hire a Care Giver

Technology

Intro

Technology

https://www.peoplemanagement.co.uk/article/1747153/one-in-seven-workers-say-employer-monitoring-has-increased-during-covid

Intro

Audio is capable of capturing

  • Vocal interactions
  • Dangerous events which manifest audio cues

Different modalities are capable of capturing different information

Intro

Microphones have been widely accepted in homes

Intro

Problem Statement:

How can we build a system that:

1. Can identify new incoming speakers and re-identify them

2. Can operate in real time as an online algorithm

Intro

Few-Shot Speaker ID

Intro

Two Parts:

I. Detection of New Classes

II. Identification of Speakers

audio -> Is this a new speaker?

  • yes: Enroll / Register new speaker; Speaker = K'
  • no: Identify speaker; Speaker = k^* = \argmax_k P(k), k \in \{1, ..., K\}

Few-Shot Spkr ID

Intro

Traditional Classification

Few-Shot Spkr ID

Intro

Prob. Graph. Models

Support Vector Machines

Neural Networks

Decision Trees

Traditional Classification

Few-Shot Spkr ID


Traditional Classification

But this requires a lot of data!

Few-Shot Spkr ID

Traditional Classification

Learn how to do a task well

Few-Shot Classification

(Meta-Learning)

Learn how to learn tasks well

Traditional Speaker Identification

Few-Shot Spkr ID

Intro

Few-Shot Classification for Speaker Identification is good!

Learn how to learn tasks well

Few-Shot Spkr ID

Intro

Are there algorithms for speaker identification?

X-Vector Networks

[Figure: x-vector network. Input frames 1, 2, 3, ..., T pass through a Time-Delay Neural Network, then Stats Pooling, then a DNN (Layers 1 to 8 in total) to produce the x-vector. The layers use frame contexts between t-3 and t+3 around frame t.]

Our work:

0.5 secs audio

Few-Shot Spkr ID

Intro

... but the original x-vector system relies on traditional classification

 

What about few-shot learning?

 

How does few-shot learning work?

Prototypical Networks for Few-Shot Learning

Episode 1

Randomly choose classes:

  • Support Set (N_S = 3): Used to create prototypes (i.e. centroids)
  • Query Set (N_Q = 2): Used for training
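The episode construction on this slide can be sketched in a few lines. This is a minimal illustration, not the training code behind the talk: the function name `sample_episode`, the toy `data` dictionary, and the number of classes per episode are all assumptions; only N_S = 3 and N_Q = 2 come from the slide.

```python
import random

def sample_episode(data, n_classes, n_support=3, n_query=2, rng=None):
    """Draw one few-shot episode.

    data: dict mapping a class (speaker) label to a list of examples.
    Returns (support, query), two dicts keyed by the sampled classes.
    """
    rng = rng or random.Random()
    # Randomly choose the classes for this episode.
    classes = rng.sample(sorted(data), n_classes)
    support, query = {}, {}
    for c in classes:
        # Draw N_S + N_Q distinct examples, then split them.
        picked = rng.sample(data[c], n_support + n_query)
        support[c] = picked[:n_support]   # used to build the prototype
        query[c] = picked[n_support:]     # used to compute the training loss
    return support, query

# Hypothetical toy data: 6 speakers with 10 utterance ids each.
data = {f"spk{i}": [f"spk{i}_utt{j}" for j in range(10)] for i in range(6)}
support, query = sample_episode(data, n_classes=3, rng=random.Random(0))
```

Each new episode calls `sample_episode` again, so the classes and the support/query examples change from episode to episode.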

Few-Shot Spkr ID

Intro

Prototypical Networks for Few-Shot Learning

Episode 2

Randomly choose classes:

  • Support Set (N_S = 3): Used to create prototypes (i.e. centroids)
  • Query Set (N_Q = 2): Used for training

Few-Shot Spkr ID

Intro

[Figure: the x-vector network: Input, Time-Delay Neural Network, Stats Pooling, DNN (Layers 1 to 8), x-vector.]

X-Vector System as a Prototypical Network

Few-Shot Spkr ID

Intro

[Figure: the same network with the DNN (Layer 7 and Layer 8) detached. What remains is Input, Layers 1 to 6 (Time-Delay Neural Network and Stats Pooling), and the x-vector.]


Few-Shot Spkr ID

Intro

X-Vector System as a Prototypical Network


Euclidean Distance

Assumption:

The latent subspace creates features which have Gaussian-like characteristics

p_\theta(y=k| \boldsymbol{x}) = \frac{e^{-dist(f_\theta(\boldsymbol{x}),\boldsymbol{c}_k)}}{\sum_{k'}e^{-dist(f_\theta(\boldsymbol{x}),\boldsymbol{c}_{k'})}}
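As a rough sketch of how this class probability is computed: prototypes are the per-class means of the support embeddings, and a query is scored by a softmax over negative distances to those prototypes. The function names and toy embeddings below are assumptions, and the distance is taken to be the squared Euclidean distance (as in the original prototypical networks paper); the slide itself only says "Euclidean Distance".

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Centroid c_k of the support embeddings of each class k."""
    return np.stack([support_emb[support_labels == k].mean(axis=0)
                     for k in range(n_classes)])

def proto_probs(query_emb, protos):
    """p(y = k | x): softmax over negative squared Euclidean distances."""
    # Pairwise squared distances, shape (n_queries, n_classes).
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy 2-D embeddings standing in for x-vectors f_theta(x).
support_emb = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]])
support_labels = np.array([0, 0, 1, 1])
protos = prototypes(support_emb, support_labels, n_classes=2)
probs = proto_probs(np.array([[0.0, 1.0], [5.0, 4.0]]), protos)
pred = probs.argmax(axis=1)
```

The closer a query embedding is to a prototype, the larger its probability for that class, which is why the exponent carries a minus sign.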

Show formula

Few-Shot Spkr ID

Intro

How would this setup perform under different _____?

Number of Classes/Speakers

x-vector dimension

512-dim

128-dim

16-dim

Number of Samples in Support/Query Sets

0.5s

Few-Shot Spkr ID

Intro

So let's talk about the training procedure!

Few-Shot Spkr ID

Intro

C_{train}
C_{valid}
C_{test}

Training x-vector system as a prototypical network

Dataset: VoxCeleb1

Few-Shot Spkr ID

Intro

C_{train}
C_{valid}

X-Vector

System

x-vectors

prototypical

loss

C_{test}

Training x-vector system as a prototypical network

Few-Shot Spkr ID

Intro

C_{train}
C_{valid}

X-Vector

System

x-vectors

C_{test}

Training x-vector system as a prototypical network

Few-Shot Spkr ID

Intro

So what did these "clustered" x-vectors look like?

Few-Shot Spkr ID

Intro

Let's visually inspect whether x-vectors clustered

Few-Shot Spkr ID

Intro

Test Accuracy on Few-Shot Speaker Identification

Few-Shot Spkr ID

Intro

E.g. Reduction in computational footprint by 18%

Few-Shot Spkr ID

Intro

Few-Shot

Speaker ID

  • Reduction in computational footprint by 18%

  • Learn quickly within 2.5s of audio

Few-Shot

Speaker ID

Detecting New Classes

Few-Shot Spkr ID

Intro

C_{test}
C_{valid}
C_{train}

Detecting New Classes

Few-Shot Spkr ID

Intro

C_{test}
C_{valid}
C_{train}

[Figure: speakers with > 41 mins are split 80% / 10% / 10% into S_{train}, S_{valid}, S_{test}; U_{valid} and U_{test} are shown alongside; all sets pass through the X-Vector System.]

Detecting New Classes

Few-Shot Spkr ID

Intro

[Figure: speakers with > 41 mins, split 80% / 10% / 10% into S_{train}, S_{valid}, S_{test} (each marked "seen"), with U_{valid} and U_{test} alongside.]

Stats for Gaussians

\begin{align*} \mu &\in \mathbb{R}^{K \times F} \\ \Sigma &\in \mathbb{R}^{K \times F \times F} \end{align*}
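The "Stats for Gaussians" can be illustrated with a small sketch that stacks one mean and one covariance per class, giving arrays of shape K x F and K x F x F. The function name, the random stand-in x-vectors, and the choice of the unbiased sample covariance are assumptions made for illustration.

```python
import numpy as np

def class_gaussian_stats(xvecs, labels, n_classes):
    """Per-class mean (K, F) and covariance (K, F, F) of the x-vectors."""
    n_features = xvecs.shape[1]
    mu = np.zeros((n_classes, n_features))
    sigma = np.zeros((n_classes, n_features, n_features))
    for k in range(n_classes):
        xk = xvecs[labels == k]
        mu[k] = xk.mean(axis=0)
        # rowvar=False: rows are observations, columns are features.
        sigma[k] = np.cov(xk, rowvar=False)
    return mu, sigma

# Stand-in data: K = 3 classes, F = 4 dimensions, 100 x-vectors per class.
rng = np.random.default_rng(0)
xvecs = rng.normal(size=(300, 4))
labels = np.repeat(np.arange(3), 100)
mu, sigma = class_gaussian_stats(xvecs, labels, n_classes=3)
```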

Detecting New Classes

Few-Shot Spkr ID

Intro

[Figure: the same split and Stats for Gaussians, now also showing speakers with < 41 mins and U_{train}.]

Mahalanobis Distances

d_M(x | \mu, \Sigma) = \sqrt{(x-\mu)^T \Sigma^{-1} (x-\mu)}
\mathcal{N}(x|\mu, \Sigma) = \left[(2\pi)^D |\Sigma|\right]^{-\frac{1}{2}} e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}
\mathcal{N}(x|\mu, \Sigma) = \left[(2\pi)^D |\Sigma|\right]^{-\frac{1}{2}} e^{-\frac{1}{2} d_M^2(x|\mu, \Sigma)}
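The relation between the Mahalanobis distance and the Gaussian density can be checked numerically. The sketch below uses made-up 2-D values for x, \mu and \Sigma; the function names are illustrative.

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance d_M^2(x | mu, Sigma)."""
    diff = x - mu
    # Solve Sigma z = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(sigma, diff))

def gaussian_pdf(x, mu, sigma):
    """N(x | mu, Sigma) written in terms of d_M^2."""
    dim = mu.shape[0]
    norm = ((2 * np.pi) ** dim * np.linalg.det(sigma)) ** -0.5
    return norm * np.exp(-0.5 * mahalanobis_sq(x, mu, sigma))

mu = np.array([0.0, 0.0])
sigma = np.array([[4.0, 0.0], [0.0, 1.0]])
x = np.array([2.0, 1.0])

d2 = mahalanobis_sq(x, mu, sigma)   # 2^2 / 4 + 1^2 / 1 = 2
pdf = gaussian_pdf(x, mu, sigma)    # exp(-1) / (4 * pi)
```

Because the normalising factor does not depend on x, ranking points by d_M^2 under a fixed Gaussian is equivalent to ranking them by density.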

Mahalanobis Distances

\mathcal{N}(x|\mu, \Sigma) = \left[(2\pi)^D |\Sigma|\right]^{-\frac{1}{2}} e^{-\frac{1}{2} d^2_M(x|\mu, \Sigma)}
\propto e^{-\frac{1}{2} d_M^2(x|\mu, \Sigma)}
\sum_{i=1}^5 d_M^2(x_i|\mu_k, \Sigma_k) = -2 \ln \left[\prod _{i=1}^5 \mathcal{N}\left(x_i | \mu_k, \Sigma_k\right) \right] - 5 \ln \left[(2\pi)^D |\Sigma_k|\right]

Unseen

Seen

d_M^2(x_1 | \mu_k, \Sigma_k)
d_M^2(x_2 | \mu_k, \Sigma_k)
d_M^2(x_3 | \mu_k, \Sigma_k)
d_M^2(x_4 | \mu_k, \Sigma_k)
d_M^2(x_5 | \mu_k, \Sigma_k)

This greatly reduces computational resources
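A minimal sketch of the joint score over a group of x-vectors: the squared Mahalanobis distances are simply summed per cluster, so no exponentials or normalising constants are evaluated. Function names and the toy clusters are assumptions; the group size of 5 follows the slide.

```python
import numpy as np

def joint_maha_sq(xs, mu, sigma_inv):
    """Sum over i of d_M^2(x_i | mu, Sigma) for the rows x_i of xs."""
    diffs = xs - mu                                   # shape (N, F)
    return float(np.einsum("nf,fg,ng->", diffs, sigma_inv, diffs))

def closest_cluster(xs, mus, sigmas):
    """Index and joint score of the cluster with the smallest joint distance."""
    scores = [joint_maha_sq(xs, m, np.linalg.inv(s))
              for m, s in zip(mus, sigmas)]
    k = int(np.argmin(scores))
    return k, scores[k]

# Two toy clusters with identity covariance, and 5 identical x-vectors.
mus = [np.zeros(2), np.array([10.0, 10.0])]
sigmas = [np.eye(2), np.eye(2)]
xs = np.tile([10.0, 11.0], (5, 1))

k, score = closest_cluster(xs, mus, sigmas)   # each x_i is at d_M^2 = 1 from cluster 1
```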

Detecting New Classes

Few-Shot Spkr ID

Intro

\gamma

speakers with > 41 mins

80%

10%

Stats for Gaussians

\begin{align*} \mu &\in \mathbb{R}^{K \times F} \\ \Sigma &\in \mathbb{R}^{K \times F \times F} \end{align*}

seen

seen

seen

S_{valid}

10%

S_{test}
S_{train}
U_{valid}
U_{test}

Detecting New Classes

Few-Shot Spkr ID

Intro

\gamma
S_{train}
S_{valid}
U_{valid}
U_{test}

speakers with > 41 mins

80%

10%

seen

seen

seen

S_{test}

10%

Stats for Gaussians

\begin{align*} \mu &\in \mathbb{R}^{K \times F} \\ \Sigma &\in \mathbb{R}^{K \times F \times F} \end{align*}

?

?

?

Detecting New Classes

Few-Shot Spkr ID

Intro

\gamma
S_{train}
S_{valid}
U_{valid}

speakers with > 41 mins

80%

10%

seen

seen

seen

10%

Stats for Gaussians

\begin{align*} \mu &\in \mathbb{R}^{K \times F} \\ \Sigma &\in \mathbb{R}^{K \times F \times F} \end{align*}
U_{test}
S_{test}

?

\lessgtr \gamma

?

\lessgtr \gamma

?

\lessgtr \gamma

?

\begin{align*} &=5 \text{ x-vectors} \\ &\equiv 5\text{ audio segments} \end{align*}

Detecting New Classes

Few-Shot Spkr ID

Intro

\gamma
\gamma

Compute F1 scores

?

\begin{align*} &=5 \text{ x-vectors} \\ &\equiv 5\text{ audio segments} \end{align*}
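One way to pick the threshold \gamma is to sweep candidate values on validation data and keep the one with the best F1 score for the seen / unseen decision. The sketch below is an assumption about that procedure, not a description of the exact experiment: the scores, labels, and candidate grid are invented, and "unseen" is treated as the positive class.

```python
def f1_score(y_true, y_pred):
    """F1 for boolean labels, with True as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pick_gamma(scores, is_unseen, candidates):
    """Candidate threshold that maximises F1 when predicting unseen if score > gamma."""
    return max(candidates,
               key=lambda g: f1_score(is_unseen, [s > g for s in scores]))

# Invented joint distances for 3 seen and 3 unseen groups of x-vectors.
scores = [3.0, 4.0, 5.0, 20.0, 25.0, 30.0]
is_unseen = [False, False, False, True, True, True]

gamma = pick_gamma(scores, is_unseen, candidates=[1.0, 10.0, 40.0])
```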

Detecting New Classes

Few-Shot Spkr ID

Intro

S_{train}
S_{valid}
U_{valid}

speakers with > 41 mins

80%

10%

seen

seen

seen

10%

Stats for Gaussians

\begin{align*} \mu &\in \mathbb{R}^{K \times F} \\ \Sigma &\in \mathbb{R}^{K \times F \times F} \end{align*}
U_{test}
S_{test}

?

\lessgtr \gamma

?

\lessgtr \gamma

?

\lessgtr \gamma

F1 Score on Detection between Seen / Unseen Classes on Test Set

Detecting New Classes

Few-Shot Spkr ID

Intro

Detecting

New Classes

Detecting New Classes

Few-Shot Spkr ID

Intro

Few-Shot

Speaker ID

  • Created method to detect new classes based on few-shot learning clustering

  • Detection works under 2.5s of audio

Detecting

New Classes

Detecting New Classes

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Few-Shot

Speaker ID

VoxConverse Dataset

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

t-distributed Stochastic Neighbor Embeddings (t-SNE)

2D Data

3D Data

What about 4 dimensions? 6 dimensions? 32 dimensions?

Our desired x-vectors have 32 dimensions!

 

We can use t-SNE to check qualitatively whether the x-vectors have formed clusters

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

edixl

fkvvo

Bare t-SNE Projections of x-vectors

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors -> Is this a new speaker?

  • yes: Enroll / Register new speaker; Speaker = K'
  • no: Identify speaker; Speaker = k^* = \argmax_k P(k), k \in \{1, ..., K\}

This setup has many caveats!

Prob 1: The system will not know the actual labels as it creates predicted labels

Solution:

Matching with Hungarian Algorithm

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Matching Algorithms: Hungarian Algorithm

$10

$40

$50

$50

$80

$80

$50

$70

$60

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Matching Algorithms: Hungarian Algorithm

$10

$40

$50

$50

$70

$60

$50

$80

$80

Cost Matrix
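To make the matching objective concrete, the sketch below finds the minimum-cost one-to-one assignment for a 3 x 3 cost matrix by exhaustive search. This is a brute-force stand-in, not the Hungarian algorithm itself: it reaches the same optimum, which the Hungarian algorithm finds in polynomial time. The dollar amounts are the ones on the slide, but their arrangement into rows and columns is an assumption.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Minimum-cost one-to-one assignment of rows to columns (exhaustive search)."""
    n = len(cost)

    def total(perm):
        # perm[i] is the column assigned to row i.
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)

# Assumed layout of the slide's values (rows and columns are hypothetical).
cost = [
    [10, 40, 50],
    [50, 70, 60],
    [50, 80, 80],
]

assignment, total_cost = optimal_assignment(cost)
```

For label matching, the rows would be predicted classes, the columns true classes, and the cost something like the negative overlap between them.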

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Matching Algorithms: Hungarian Algorithm

Prob 1: The system will not know the actual labels as it creates predicted labels

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Matching Algorithms: Hungarian Algorithm

Prob 1: The system will not know the actual labels as it creates predicted labels

There will be leftover classes when using the Hungarian algorithm

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Matching Algorithms: Greedy Algorithm

Prob 1: The system will not know the actual labels as it creates predicted labels

Greedy algorithms will use up every predicted class found!
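One plausible reading of the greedy matching, sketched under that assumption: every predicted class is mapped to the true class it overlaps with most, so several predicted classes may share one true label and none is left unmatched. The function name and the toy label sequences are hypothetical.

```python
from collections import Counter

def greedy_label_map(true_labels, pred_labels):
    """Map each predicted class to the true class it co-occurs with most often."""
    overlap = Counter(zip(pred_labels, true_labels))
    mapping = {}
    for pred in set(pred_labels):
        # Overlap counts between this predicted class and every true class.
        candidates = {t: c for (p, t), c in overlap.items() if p == pred}
        mapping[pred] = max(candidates, key=candidates.get)
    return mapping

# Toy example: speaker A was split into two predicted classes (0 and 1).
true_labels = ["A", "A", "A", "B", "B", "B"]
pred_labels = [0, 0, 1, 2, 2, 2]

mapping = greedy_label_map(true_labels, pred_labels)
```

With a one-to-one matching such as the Hungarian algorithm, predicted class 1 would be left over here; the greedy mapping assigns it to A as well.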

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Prob 2: What if the detector has too many false positives?

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Prob 2: What if the detector has too many false positives?

[Figure: sequence of detected speaker segments, labelled A through L; several labels recur in short runs.]

This is VERY segmented!

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Prob 2: What if the detector has too many false positives?

[Figure: the same segmented sequence (labels A through L), matched with the Hungarian Algorithm.]

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Prob 2: What if the detector has too many false positives?

[Figure: the same segmented sequence (labels A through L), matched with the Greedy Algorithm.]

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Results will be displayed in this manner:

[Figure: example plot with Time (s) on one axis and Speaker on the other; each speaker has paired true label / predicted label tracks.]

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Experiment 1: Baseline

Mahalanobis Classifier

  • x-vectors are pushed into the xvec queue
  • Compute joint Maha. dists to closest cluster (using \mu, \Sigma)
  • if > \gamma is true: Create new class; Classify as new cluster
  • if > \gamma is false: Classify as closest cluster
  • Output: Class
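The Mahalanobis Classifier flow on this slide can be sketched as a small online loop. This is an illustration under stated assumptions, not the system's implementation: the class name, a single shared \Sigma, the use of the queue mean as the new cluster's \mu, and the toy values of \gamma and the queues are all hypothetical.

```python
import numpy as np

class MahalanobisClassifier:
    """Online classifier: join the closest cluster or create a new one."""

    def __init__(self, sigma, gamma):
        self.sigma_inv = np.linalg.inv(sigma)   # shared covariance (assumption)
        self.gamma = gamma                      # new-speaker threshold
        self.mus = []                           # one mean per enrolled class

    def _joint_dist(self, xs, mu):
        diffs = xs - mu
        return float(np.einsum("nf,fg,ng->", diffs, self.sigma_inv, diffs))

    def step(self, xvec_queue):
        """Return the class index for the current queue of x-vectors."""
        xs = np.asarray(xvec_queue)
        if self.mus:
            dists = [self._joint_dist(xs, mu) for mu in self.mus]
            k = int(np.argmin(dists))
            if dists[k] <= self.gamma:
                return k                        # classify as closest cluster
        # Joint distance above gamma (or nothing enrolled yet): create new class.
        self.mus.append(xs.mean(axis=0))
        return len(self.mus) - 1

clf = MahalanobisClassifier(sigma=np.eye(2), gamma=10.0)
queues = [
    np.tile([0.0, 0.0], (5, 1)),   # first speaker
    np.tile([0.1, 0.0], (5, 1)),   # same speaker, slightly shifted
    np.tile([5.0, 5.0], (5, 1)),   # far away: should enroll a new speaker
]
labels = [clf.step(q) for q in queues]
```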

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Experiment 1: Baseline

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian

Greedy

Baseline

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian

Baseline

Conclusion:

  1. The threshold for detection of new speakers does not generalize across datasets
  2. We needed to learn a little about the transfer function

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Remember how in VoxCeleb1, we had 29 speakers with more than 41 mins of audio?

speakers with > 41 mins

80%

10%

10%

S_{train}

Stats for Gaussians

\begin{align*} \mu &\in \mathbb{R}^{K \times F} \\ \Sigma &\in \mathbb{R}^{K \times F \times F} \end{align*}

seen

seen

unseen

unseen

seen

S_{valid}
S_{test}
U_{test}

29 Speakers

U_{valid}

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

Class

\Sigma

Experiment 2: Using 41min Covariance as Model Cov

Although this experiment didn't work well, it further corroborated our "no free lunch" takeaway

Speaker Identification

Sensor Localization

Intro

System Infrastructure

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

Class

\Sigma
\Sigma_{41min} = \text{median}(\Sigma_1, \Sigma_2, \Sigma_3, \Sigma_4)

Experiment 2: Using 41min Covariance as Model Cov
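A minimal sketch of combining per-speaker covariances into a single model covariance, assuming the median is taken element-wise across the matrices (the slide does not say how it is defined). The four toy matrices are invented.

```python
import numpy as np

def median_covariance(sigmas):
    """Element-wise median of a list of (F, F) covariance matrices."""
    return np.median(np.stack(sigmas), axis=0)

# Four toy per-speaker covariances, Sigma_1 ... Sigma_4.
sigmas = [np.eye(2) * s for s in (1.0, 2.0, 3.0, 10.0)]

sigma_41min = median_covariance(sigmas)   # diagonal entries: median of 1, 2, 3, 10
```

An element-wise median of positive definite matrices is not guaranteed to be positive definite in general, so a real system would likely need to check this before inverting the result.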

Speaker Identification

Sensor Localization

Intro

System Infrastructure

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

Class

\Sigma

Experiment 2: Using 41min Covariance as Model Cov

Baseline

Using 41min Covariance

Result comparison between

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Hungarian Matching

Using 41min Covariance

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Hungarian

Greedy

Using 41min Covariance

Speaker Identification

Sensor Localization

Intro

System Infrastructure

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

Class

Covariance Adaptation

\Sigma

Experiment 3: 5s Initial Covariance Adaptation

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

Class

Covariance Adaptation

\Sigma

xvec queue

\text{if } t < 5s

false

true

\text{if } t = 5s

true

false

Collect the

x-vectors

...

Collection

Experiment 3: 5s Initial Covariance Adaptation

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

Class

Covariance Adaptation

\Sigma

xvec queue

\text{if } t < 5s

true

false

Collect the

x-vectors

\text{if } t = 5s

true

false

...

Collection

Trained Covariance

\Sigma^*

Train covariance

matrix on Collection

Experiment 3: 5s Initial Covariance Adaptation
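The covariance adaptation step can be sketched as follows: collect the x-vectors that arrive during the first 5 s, then estimate \Sigma^* from that collection. Function names, the 0.5 s hop between x-vectors, and the small ridge term `eps` are assumptions; the ridge is added because a handful of x-vectors can leave the sample covariance singular when the x-vector dimension is larger than the number of samples.

```python
import numpy as np

def collect_warmup(xvec_stream, warmup_seconds=5.0, hop_seconds=0.5):
    """Collect the x-vectors that arrive while t < warmup_seconds."""
    n_warmup = int(round(warmup_seconds / hop_seconds))
    collection = []
    for i, x in enumerate(xvec_stream):
        if i >= n_warmup:
            break
        collection.append(x)
    return collection

def train_covariance(collection, eps=1e-3):
    """Sample covariance of the collection plus a small ridge (assumption)."""
    xs = np.asarray(collection)
    sigma = np.cov(xs, rowvar=False)
    return sigma + eps * np.eye(xs.shape[1])

# Stand-in stream: 20 three-dimensional x-vectors, one every 0.5 s.
rng = np.random.default_rng(0)
stream = list(rng.normal(size=(20, 3)))

collection = collect_warmup(stream)
sigma_star = train_covariance(collection)
```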

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

Class

Covariance Adaptation

\Sigma
\Sigma^*

Experiment 3: 5s Initial Covariance Adaptation

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Baseline

5s Initial Covariance Adaptation

Result comparison between

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian Matching

5s Initial Covariance Adaptation

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian

Greedy

5s Initial Covariance Adaptation

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Statistics as Dynamic Models (Algorithmic Statistics)

Sample Mean at time T

\mu_T = \frac{1}{T} \sum_{t=1}^{T} x_t = \mu_{T-1} + \frac{1}{T} (x_T - \mu_{T-1})

Sample Covariance at time T

\Sigma_{T} = \frac{1}{T-1} \sum_{t=1}^T (x_t - \mu_T) (x_t - \mu_T)^T = \frac{T-2}{T-1} \Sigma_{T-1} + \frac{1}{T} (x_T - \mu_{T-1}) (x_T - \mu_{T-1}) ^T
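The recursive updates can be written as a small running-statistics class and checked against the batch estimates. The class name and the random test data are illustrative; the update lines follow the recursions for \mu_T and \Sigma_T, with the covariance left at zero until two samples have been seen.

```python
import numpy as np

class RunningStats:
    """Recursive sample mean and sample covariance."""

    def __init__(self, dim):
        self.T = 0
        self.mu = np.zeros(dim)
        self.sigma = np.zeros((dim, dim))

    def update(self, x):
        self.T += 1
        T = self.T
        delta = x - self.mu                     # x_T - mu_{T-1}
        if T >= 2:
            # Sigma_T = (T-2)/(T-1) * Sigma_{T-1} + delta delta^T / T
            self.sigma = (T - 2) / (T - 1) * self.sigma + np.outer(delta, delta) / T
        # mu_T = mu_{T-1} + delta / T
        self.mu = self.mu + delta / T

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))

stats = RunningStats(dim=3)
for x in data:
    stats.update(x)
```

Each update costs O(F^2) and needs only the current \mu and \Sigma, so no past x-vectors have to be stored.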

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

\Sigma
k

Class

Experiment 4: Algorithmic Stats

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

\Sigma
k

Class

k

Class

Experiment 4: Algorithmic Stats

Update

\mu_k, \Sigma_k

Classify as new cluster

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Baseline

Algorithmic Statistics

Result comparison between

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian Matching

Algorithmic Statistics

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian

Greedy

Algorithmic Statistics

Speaker Identification

Sensor Localization

Intro

System Infrastructure

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

\Sigma
k

Class

k

Class

Update

\mu_k, \Sigma_k

Experiment 5: 5s Cov. Adapt + Algorithmic Stats

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

\Sigma
k

Class

k

Class

Update

\mu_k, \Sigma_k

Covariance Adaptation

\Sigma^*

Experiment 5: 5s Cov. Adapt + Algorithmic Stats

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

x-vectors

joint Maha. dist

to closest cluster

\text{if } > \gamma

true

false

xvec queue

xvec queue

\mu

Create new

class

Classify as new cluster

Classify as closest cluster

Compute joint Maha. dists to closest cluster

Mahalanobis Classifier

\Sigma
k

Class

k

Class

Update

\mu_k, \Sigma_k

xvec queue

\text{if } t < 5s

true

false

Collect the

x-vectors

\text{if } t = 5s

true

false

...

Collection

Train covariance

matrix on Collection

Trained Covariance

\Sigma^*

Experiment 5: 5s Cov. Adapt + Algorithmic Stats

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Result comparison between Baseline and 5s Cov. Adapt + Algorithmic Stats

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian Matching

5s Cov. Adapt + Algorithmic Stats

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian

Greedy

5s Cov. Adapt + Algorithmic Stats
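To illustrate how Hungarian and Greedy matching can differ, here is a small sketch, assuming both are used to map discovered clusters onto ground-truth speakers through an overlap matrix. The optimal assignment, which the Hungarian algorithm finds in polynomial time, is brute-forced here to keep the example dependency-free. The matrix values are made up for illustration.

```python
from itertools import permutations

def greedy_match(overlap):
    """Repeatedly take the largest remaining cell; one match per row and column."""
    cells = sorted(
        ((overlap[r][c], r, c)
         for r in range(len(overlap))
         for c in range(len(overlap[0]))),
        reverse=True,
    )
    used_rows, used_cols, match = set(), set(), {}
    for _, r, c in cells:
        if r not in used_rows and c not in used_cols:
            match[r] = c
            used_rows.add(r)
            used_cols.add(c)
    return match

def optimal_match(overlap):
    """Best one-to-one assignment by brute force (square matrix, small K only)."""
    k = len(overlap)
    best = max(
        permutations(range(k)),
        key=lambda p: sum(overlap[r][p[r]] for r in range(k)),
    )
    return dict(enumerate(best))

def total(overlap, match):
    """Sum of the overlaps selected by a cluster-to-speaker mapping."""
    return sum(overlap[r][c] for r, c in match.items())

# Rows: discovered clusters, columns: true speakers (illustrative counts).
overlap = [
    [10, 9, 0],
    [9, 0, 0],
    [0, 8, 1],
]
greedy_total = total(overlap, greedy_match(overlap))
optimal_total = total(overlap, optimal_match(overlap))
print(greedy_total, optimal_total)
```

Greedy locks in the single largest cell first and can block a better overall mapping, so scores computed after greedy matching are never higher than those computed after optimal matching.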

Speaker Identification

Sensor Localization

Intro

System Infrastructure

[Flowchart] x-vectors -> xvec queue -> compute joint Maha. dists to closest cluster

If joint Maha. dist to closest cluster > \gamma:
  true: classify as new cluster; create new class (\mu)
  false: classify as closest cluster k (Mahalanobis Classifier, \Sigma_k)

Covariance Adaptation: \Sigma^*

Update \mu_k

Experiment 6: 5s Cov. Adapt + Algorithmic Mean

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Result comparison between Baseline and 5s Cov. Adapt + Algorithmic Mean

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian Matching

5s Cov. Adapt + Algorithmic Mean

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Hungarian

Greedy

5s Cov. Adapt + Algorithmic Mean

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

letter-value plots

Results for entire test set

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Results for entire test set

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Detecting

New Classes

Zero-Shot & Adaptive ID

Few-Shot

Speaker ID

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

  • Identify and continually adapt vocal embeddings of voices it's never heard before

Detecting

New Classes

Zero-Shot & Adaptive ID

Few-Shot

Speaker ID

Real-Time Platform

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

  • Identify and continually adapt vocal embeddings of voices it's never heard before

Detecting

New Classes

Zero-Shot & Adaptive ID

Few-Shot

Speaker ID

Real-Time Platform

Sensor Localization

System Infrastructure

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Detecting

New Classes

Zero-Shot & Adaptive ID

Few-Shot

Speaker ID

Sensor Localization

System Infrastructure

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

Real-Time Platform

The Aware Home

Real-Time Platform

Detecting New Classes

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Finding Optimal Locations for Max. Coverage at Aware Home

Real-Time Platform

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Intel i7-4770 (June 2013): 4 cores / 8 threads @ 3.40 GHz, 32 GB

Creating the Platform

Real-Time Platform

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Recorder

Recorder

Recorder

Recorder

Creating the Platform

Real-Time Platform

Speaker Identification

Sensor Localization

Intro

System Infrastructure

[Diagram] Speaker -> Recorder (x4) -> \Delta x

If \Delta x > \gamma:
  true: Speaker = New Speaker; Register New Class/Speaker
  false: Speaker = Closest Cluster

Update speaker distr.

Creating the Platform

Real-Time Platform

Speaker Identification

Sensor Localization

Intro

System Infrastructure

[Diagram] Speaker -> Recorder (x4) -> \Delta x

If \Delta x > \gamma:
  true: Speaker = New Speaker; Register New Class/Speaker
  false: Speaker = Closest Cluster

Update speaker distr.

Front-End Dashboard

Creating the Platform

Real-Time Platform

Speaker Identification

Sensor Localization

Intro

System Infrastructure

[Flowchart] Audio -> x-vector system -> x-vectors -> compute joint Maha. dists to closest cluster

If joint Maha. dist to closest cluster > \gamma:
  true: classify as new cluster; create new class (\mu)
  false: classify as closest cluster k (Mahalanobis Classifier, \Sigma)

Covariance Adaptation: \Sigma^*

Update \mu_k

Detection of Classes Method could be used for other applications

[Flowchart: the same detection pipeline (joint Maha. dists to closest cluster, threshold \gamma, Mahalanobis Classifier, Covariance Adaptation, update \mu_k), fed by generic Inputs]

Media Entertainment

Journalism

Object Recognition

Bioinformatics

Meetings

Data Mining

Detecting

New Classes

Zero-Shot & Adaptive ID

Few-Shot

Speaker ID

Real-Time Platform

Sensor Localization

System Infrastructure

Zero-Shot & Adaptive ID

Few-Shot Spkr ID

Intro

Detecting New Classes

\gamma

?

\lessgtr \gamma

?

\lessgtr \gamma

?

\lessgtr \gamma

seen

seen

seen

Speaker

\Delta x

Register New Class/Speaker

Speaker = New Speaker

Speaker = Closest Cluster

\text{if } \Delta x > \gamma

Update

speaker distr.

false

true

Front-End Dashboard

joint Maha. dists to closest cluster

\text{if } > \gamma

k = closest cluster

k = new cluster

Mahalanobis Classifier

k
k

Cov

Adapt

\mu_k

new class

F

T

\mu
\Sigma^*

Thank you so much for having me!

Intro

What is the difference between an Online Algorithm and an Offline Algorithm?

Offline Algorithm: You have all the data from the beginning (e.g. Line of Best Fit)

Online Algorithm: You receive the data one piece at a time

Intro

Offline Algorithm: You have all the data from the beginning (e.g. Line of Best Fit)

Online Algorithm: You can have all the data but load one piece of data at a time (e.g. Real-Time data)
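As a small illustration of the distinction, the sketch below computes the same mean and variance in two ways: offline, with the full dataset in hand, and online, with Welford's update applied one sample at a time. The class and function names are chosen for this example only.

```python
def offline_mean_var(data):
    """Offline: the whole dataset is available before anything is computed."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var

class OnlineMeanVar:
    """Online (Welford): statistics are updated as each sample arrives."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self.mean, self.m2 / self.n

stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

batch = offline_mean_var(stream)

online = OnlineMeanVar()
for sample in stream:
    running = online.update(sample)   # a usable estimate exists after every sample

print(batch, running)
```

Both routes end at the same statistics, but the online version never needs the earlier samples again, which is what makes it suitable for real-time data.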

Intro

How was the framework for tessellation built?

Sensor Localization

Intro

Making a Framework for Max Coverage

There are a few steps that needed to be accomplished:

  1. Create a virtual environment
  2. Create agents based on a model
  3. Create a swarm based on agents

Sensor Localization

Intro

Creating Agents

Sensor Localization

Intro

Creating a Swarm

Sensor Localization

Intro

Creating a Swarm

\cap
=

Sensor Localization

Intro

Can you further describe the networked controls implemented?

Sensor Localization

Intro

Lloyd's Algorithm works well for Simply Connected Environments

u_{i,L}
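A rough sketch of one Lloyd step, assuming u_{i,L} is the vector from agent i to the centroid of its Voronoi cell. The slides do not spell out this definition, so treat it as an assumption. The environment here is a unit square sampled on a grid, which is simply connected. `lloyd_control` and the grid resolution are hypothetical.

```python
import numpy as np

def lloyd_control(agents, samples):
    """Assumed form of u_{i,L}: vector from each agent to its cell centroid.

    agents  : (N, 2) agent positions
    samples : (M, 2) points sampled from the free space of the environment
    """
    # Assign every sample to its nearest agent (a discrete Voronoi partition).
    d2 = ((samples[:, None, :] - agents[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)
    u = np.zeros_like(agents)
    for i in range(len(agents)):
        cell = samples[owner == i]
        if len(cell):
            u[i] = cell.mean(axis=0) - agents[i]
    return u

# Unit square sampled on a grid; two agents start close together.
grid = np.linspace(0.0, 1.0, 21)
samples = np.array([(x, y) for x in grid for y in grid])
agents = np.array([[0.2, 0.5], [0.3, 0.5]])

for _ in range(50):
    agents = agents + lloyd_control(agents, samples)

print(agents)
```

In this simply connected square the two agents spread out to opposite halves. With walls or holes, the straight-line move to the centroid may no longer be feasible, which is consistent with the caveat on non-simply connected environments.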

Sensor Localization

Intro

... but not so much for Non-Simply Connected Envs

Sensor Localization

Intro

u_{i,L}
u_{i,L}

& Good Initial Conds

Using Lloyd's Alg on EP6

Sensor Localization

Intro

Control: Avoid Obstacle

u_{i,ao} = \max \left[ -\log \left( \frac{\|\tilde{u}_{i,ao}\|}{b} \right),0 \right] \cdot \frac{\tilde{u}_{i,ao}}{\| \tilde{u}_{i,ao} \|}
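The obstacle-avoidance term can be evaluated directly from the formula. In this sketch \tilde{u}_{i,ao} is assumed to be the vector pointing from the nearest obstacle to the agent and b the distance beyond which the obstacle has no influence. Neither is defined on the slide, so both readings are assumptions.

```python
import numpy as np

def avoid_obstacle(u_tilde, b):
    """u_{i,ao} = max(-log(||u~|| / b), 0) * u~ / ||u~||."""
    norm = np.linalg.norm(u_tilde)
    if norm == 0.0:
        # Direction is undefined on top of the obstacle; return no push here.
        return np.zeros_like(u_tilde)
    gain = max(-np.log(norm / b), 0.0)
    return gain * u_tilde / norm

near = avoid_obstacle(np.array([0.5, 0.0]), b=1.0)   # inside the influence radius
far = avoid_obstacle(np.array([2.0, 0.0]), b=1.0)    # outside: gain clipped to 0
print(near, far)
```

The logarithm makes the push grow without bound as the agent approaches the obstacle, and the max with 0 switches the term off once the distance exceeds b.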

Sensor Localization

Intro

\begin{aligned} u_{i,p} &= \sum_{j\in N_i} w_{ij} (x_i - x_j) \\ \frac{\partial \mathcal{E}_{ij}}{\partial x_i} &= w_{ij} (x_i - x_j) \end{aligned}
\begin{aligned} \mathcal{E}_{ij} &= e^{-\alpha \| x_i - x_j \|} \\ \frac{\partial \mathcal{E}_{ij}}{\partial x_{i}} &= -\alpha e^{-\alpha \| x_i - x_j \|} \frac{x_i - x_j}{\| x_i - x_j \|} \\ w_{ij} &= - \frac{\alpha e^{-\alpha \| x_i - x_j \|}}{\| x_i - x_j \|} \end{aligned}
u_{i,p} = \sum _{j \in N_i} \underbrace{\frac{-\alpha e^{-\alpha \|x_i - x_j \|}}{\| x_i - x_j \|}}_{w_{ij}} \cdot (x_i - x_j)

Control: Distance oneself from other Agents
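A short numerical check of this term, taking w_{ij} as the coefficient of (x_i - x_j) in the gradient of \mathcal{E}_{ij} = e^{-\alpha \| x_i - x_j \|}. By the chain rule that coefficient carries a factor \alpha. The positions and the value of alpha are arbitrary illustration values.

```python
import numpy as np

def energy(x_i, x_j, alpha):
    """Pairwise energy E_ij = exp(-alpha * ||x_i - x_j||)."""
    return np.exp(-alpha * np.linalg.norm(x_i - x_j))

def pairwise_control(x_i, neighbors, alpha):
    """u_{i,p} = sum_j w_ij (x_i - x_j), with w_ij from the gradient of E_ij."""
    u = np.zeros_like(x_i)
    for x_j in neighbors:
        d = np.linalg.norm(x_i - x_j)
        w_ij = -alpha * np.exp(-alpha * d) / d
        u += w_ij * (x_i - x_j)
    return u

x_i = np.array([1.0, 2.0])
neighbors = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
alpha = 0.5

u_p = pairwise_control(x_i, neighbors, alpha)

# Central finite differences of sum_j E_ij with respect to x_i.
eps = 1e-6
numeric = np.zeros(2)
for axis in range(2):
    step = np.zeros(2)
    step[axis] = eps
    plus = sum(energy(x_i + step, x_j, alpha) for x_j in neighbors)
    minus = sum(energy(x_i - step, x_j, alpha) for x_j in neighbors)
    numeric[axis] = (plus - minus) / (2 * eps)

print(u_p, numeric)
```

Because every w_{ij} is negative, the summed vector points from x_i toward its neighbours. Stepping against it lowers the energy and spreads the agents apart, so the sign of the gain k_p applied to this term matters.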

Sensor Localization

Intro

Control:

u_i = k_L u_{i,L} + k_p u_{i,p} + k_{ao} u_{i,ao}

Sensor Localization

Intro

Control: Increase Coverage via Largest Boundary

Assumption:

Large hallway cross-sections lead to larger rooms than small hallway cross-sections

u_{i,b}

Sensor Localization

Intro

Control:

u_i = k_L u_{i,L} + k_p u_{i,p} + k_{ao} u_{i,ao}

Sensor Localization

Intro

Further Results

Sensor Localization

Intro

Can you describe your audiosockets package in more detail?

Sensor Localization

Intro

System Infrastructure

Python runs code sequentially by default

  • Synchronous
  • sounddevice package

Sensor Localization

Intro

System Infrastructure

How can I use it?

1. Server Descriptor

{
   "PORT": 5050,
   "HEADER": 64,
   "FORMAT": "utf-8",
   "DISCONNECT_MSG": "DISCONNECT",
   "logging_format": "%(asctime)s - %(message)s",
   "logging_level": "info"
}
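The HEADER, FORMAT and DISCONNECT_MSG fields suggest a length-prefixed message framing over TCP. The sketch below shows that general pattern with the standard library only. It is an assumption about what such fields are typically used for, not a description of the audiosockets internals, and `send_msg`, `recv_exact` and `recv_msg` are hypothetical helpers.

```python
import socket

config = {
    "PORT": 5050,
    "HEADER": 64,
    "FORMAT": "utf-8",
    "DISCONNECT_MSG": "DISCONNECT",
}

def send_msg(sock, text, cfg):
    """Send a fixed-width header holding the payload length, then the payload."""
    payload = text.encode(cfg["FORMAT"])
    header = str(len(payload)).encode(cfg["FORMAT"]).ljust(cfg["HEADER"])
    sock.sendall(header + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, since recv may return fewer than requested."""
    chunks, remaining = [], n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock, cfg):
    """Read the header, parse the length, then read that many payload bytes."""
    header = recv_exact(sock, cfg["HEADER"]).decode(cfg["FORMAT"])
    return recv_exact(sock, int(header.strip())).decode(cfg["FORMAT"])

# A connected socket pair stands in for a client/server connection on PORT.
client, server = socket.socketpair()
send_msg(client, "hello", config)
send_msg(client, config["DISCONNECT_MSG"], config)

received = []
while True:
    msg = recv_msg(server, config)
    if msg == config["DISCONNECT_MSG"]:
        break
    received.append(msg)

client.close()
server.close()
print(received)
```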

2. Start up a server

from audiosockets import MailmanSocket

mailman = MailmanSocket("server_info.json")
mailman.start()

3. Start Recorder Client

from audiosockets import RecorderSocket

recorder = RecorderSocket("server_info.json")
recorder.start()

4. Start a Processor

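from audiosockets import BaseProcessorSocket
from audiosockets.utils import LogMelSpectrogram


class LogMelSpecProcessor(BaseProcessorSocket):
    """Processor that prints the log-mel spectrogram shape of each audio chunk."""

    def __init__(self, *args, **kwargs):
        # Pass-through constructor; extend it if the processor needs its own state.
        super().__init__(*args, **kwargs)

    def process_data(self, data):
        # Each incoming message provides the sampling rate and the audio samples.
        fs = data["fs"]
        audio = data["data"]
        lms = LogMelSpectrogram(fs)(audio)
        print(lms.shape)


processor = LogMelSpecProcessor("VAD", "server_info.json")
processor.start()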

Sensor Localization

Intro

System Infrastructure

Can we visualize the distances to validate our expectations?

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Let's visualize the distances to validate our expectations

Speaker Identification

Sensor Localization

Intro

System Infrastructure

S_{test}

Let's visualize the distances to validate our expectations

Speaker Identification

Sensor Localization

Intro

System Infrastructure

U_{test}
S_{test}

Let's visualize the distances to validate our expectations

Speaker Identification

Sensor Localization

Intro

System Infrastructure

U_{test}
S_{test}

Speaker Identification

Sensor Localization

Intro

System Infrastructure

U_{test}
S_{test}

Visually inspecting the Mahalanobis distances from Seen/Unseen Test Data

Speaker Identification

Sensor Localization

Intro

System Infrastructure

What happens if we vary the number of speakers enrolled?

Speaker Identification

Sensor Localization

Intro

System Infrastructure

[Diagram: data split] Speakers with > 41 mins are the seen speakers, split 80% / 10% / 10% into S_{train}, S_{valid}, S_{test}. U_{valid} and U_{test} hold unseen speakers.

Stats for Gaussians

\begin{align*} \mu &\in \mathbb{R}^{K \times F} \\ \Sigma &\in \mathbb{R}^{K \times F \times F} \end{align*}

What happens if we vary the number of speakers enrolled?

29 Speakers

Speaker Identification

Sensor Localization

Intro

System Infrastructure

EER on Detection between Seen / Unseen Classes
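For reference, an equal error rate for a distance-threshold detector can be computed as in the sketch below, assuming the rule "distance > \gamma means unseen". The distance values are made up for illustration and `equal_error_rate` is a hypothetical helper, not the evaluation code behind the slide.

```python
import numpy as np

def equal_error_rate(seen_dists, unseen_dists):
    """Sweep gamma and return (EER, gamma) where the two error rates are closest."""
    thresholds = np.sort(np.concatenate([seen_dists, unseen_dists]))
    best_gap, eer, best_gamma = np.inf, None, None
    for gamma in thresholds:
        false_alarm = np.mean(seen_dists > gamma)    # seen speaker flagged as new
        miss = np.mean(unseen_dists <= gamma)        # new speaker taken as seen
        gap = abs(false_alarm - miss)
        if gap < best_gap:
            best_gap, eer, best_gamma = gap, (false_alarm + miss) / 2, gamma
    return float(eer), float(best_gamma)

# Illustrative distances to the closest cluster.
seen = np.array([1.0, 1.5, 2.0, 2.5, 6.0])
unseen = np.array([2.2, 5.0, 5.5, 6.5, 7.0])

eer, gamma = equal_error_rate(seen, unseen)
print(eer, gamma)
```

Raising \gamma trades false alarms on seen speakers for misses on unseen ones, and the EER is the operating point where the two rates meet.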

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Prob 3: What if the detector has too many false negatives?

[Figure: sequence of speaker labels: A x16, C x2, B x8]

Speaker Identification

Sensor Localization

Intro

System Infrastructure

Why did you train the covariance and set it as a trainable matrix?

Speaker Identification

Sensor Localization

Intro

System Infrastructure

By increasing every eigenvalue by a constant...

Speaker Identification

Sensor Localization

Intro

System Infrastructure

By increasing every eigenvalue individually...

Adaptation of Model Covariance

Brighter colors indicative of later stages
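The two kinds of adjustment can be illustrated on a small covariance matrix by editing its eigenvalues and rebuilding it. This is a sketch of the idea only: the actual training procedure for the covariance is not shown on the slides, and `adapt_covariance`, `shift` and `scale` are hypothetical names.

```python
import numpy as np

def adapt_covariance(sigma, shift=0.0, scale=None):
    """Rebuild a covariance after changing its eigenvalues.

    shift : constant added to every eigenvalue
    scale : optional per-eigenvalue multipliers, shape (F,), ascending order
    """
    eigvals, eigvecs = np.linalg.eigh(sigma)     # sigma = Q diag(lambda) Q^T
    if scale is not None:
        eigvals = eigvals * scale
    eigvals = eigvals + shift
    return (eigvecs * eigvals) @ eigvecs.T

def mahalanobis(x, mu, sigma):
    """Mahalanobis distance of x to a Gaussian with mean mu and covariance sigma."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))

sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
mu = np.zeros(2)
x = np.array([2.0, -1.0])

uniform = adapt_covariance(sigma, shift=0.5)                        # same constant everywhere
individual = adapt_covariance(sigma, scale=np.array([1.0, 3.0]))    # stretch one direction only

d_orig = mahalanobis(x, mu, sigma)
d_uniform = mahalanobis(x, mu, uniform)
d_individual = mahalanobis(x, mu, individual)
print(d_orig, d_uniform, d_individual)
```

Adding the same constant to every eigenvalue is equivalent to adding a multiple of the identity, which inflates the Gaussian equally in all directions. Adjusting eigenvalues individually reshapes it along chosen principal directions, and in both cases the Mahalanobis distance of a fixed point does not increase.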

Speaker Identification

Sensor Localization

Intro

System Infrastructure